feat(Segment membership inspection PoC): Daily ClickHouse-backed per-env segment counts#7464
Conversation
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
|
The latest updates on your projects. Learn more about Vercel for GitHub. 3 Skipped Deployments
|
Docker builds report
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #7464 +/- ##
==========================================
+ Coverage 98.39% 98.50% +0.10%
==========================================
Files 1400 1430 +30
Lines 52892 54194 +1302
==========================================
+ Hits 52044 53384 +1340
+ Misses 848 810 -38 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This comment was marked as low quality.
This comment was marked as low quality.
Visual Regression19 screenshots compared. See report for details. |
c6e464b to
bff85b4
Compare
…ip-counts # Conflicts: # .github/actions/codeartifact-login/action.yml # .github/workflows/.reusable-docker-build.yml # .github/workflows/api-deploy-production-ecs.yml # .github/workflows/api-tests-with-private-packages.yml # api/pyproject.toml # api/uv.lock
Picks up the test-tools plugin fix for `Counter.clear()` on parameterless metrics, which was breaking the segment-membership backfill tests. beep boop
…ifact flagsmith-sql-flag-engine 0.1.0a2 is published to the prod CodeArtifact PyPI index. Add codeartifact-login to the workflows that install api deps without it (api-pull-request, api-run-makefile-target, update-flagsmith-environment) and to the OSS Docker build paths, since they now resolve a private dep through CodeArtifact. The Dockerfile build-python and api-test stages now consume the prod codeartifact secret to fetch the same package. beep boop
Mark every spot where workflow/Dockerfile/pyproject changes exist solely because flagsmith-sql-flag-engine is on the prod CodeArtifact PyPI, so they can be unwound once the package is open-sourced. beep boop
Add a `CLICKHOUSE_URL` setting that mirrors the `DATABASE_URL` pattern — one DSN env var instead of six discrete `CLICKHOUSE_*` fields. The existing per-field settings remain as the fallback path so deployments that prefer overrides keep working. `clickhouse_connect.get_client` parses the DSN itself, but its per-field kwargs take precedence over parsed values (`port = port or parsed.port`), so when the URL is set the discrete CLICKHOUSE_* settings are not passed through — the DSN drives every field. beep boop
…etting Replace the `is_clickhouse_configured()` service helper with a declarative `settings.CLICKHOUSE_ENABLED`, computed once at settings load from `CLICKHOUSE_URL or CLICKHOUSE_HOST`. Tasks and the migration now short-circuit on that flag. Drop the if/else around `clickhouse_connect.get_client` and pass both `dsn=` and the discrete `CLICKHOUSE_*` kwargs unconditionally, letting the library merge them via its `arg or parsed.<field>` resolution. The discrete defaults move to None so an unset override doesn't accidentally clobber the value parsed from the DSN. beep boop
Collapse multi-paragraph docstrings down to the gist, drop mechanism re-explanations that already live in one place, and shrink the CLICKHOUSE settings block to match neighbouring style. beep boop
for more information, see https://pre-commit.ci
Line references drifted after the docstring/comment trims. Regenerated via `make generate-docs`. beep boop
emyller
left a comment
There was a problem hiding this comment.
Looks thorough, thanks for making reviewing comfortable. I have a few performance-related questions, and some design concerns to cover.
Point every CodeArtifact-related TODO at Flagsmith/flagsmith-sql-flag-engine#6 so the cleanup trigger is discoverable from the call site. beep boop
The model caches per-(segment, environment) identity-match counts, but the `SegmentMembership` name suggested a wrapper keyed by identity. Rename so the entity is self-describing, and rename the `memberships` reverse relation + API field to `membership_counts` to match. beep boop
Swap clickhouse-connect for django-clickhouse-backend so the IDENTITIES schema can be bootstrapped through manage.py migrate --database=clickhouse rather than a RunPython migration gated by CLICKHOUSE_ENABLED. The original sequencing trap (install without CH then enabling CH later left IDENTITIES uncreated) goes away. Connection settings now live in DATABASES["clickhouse"]. Backfill and refresh use connections["clickhouse"].cursor() directly; bulk inserts go through cursor.executemany, which clickhouse-driver collapses into a single native Block. A new top-level clickhouse app holds the schema migration; ClickHouseRouter fences it off the default Postgres alias whether or not CH is configured. beep boop
…tifier) (environment_id, identifier) is the natural unique key in Flagsmith's identity model, so ReplacingMergeTree can dedupe on it directly without a synthetic id column. Drops the truncated-MD5 hash and the CodeQL finding that went with it, eliminates collision risk at SaaS scale, and removes the column entirely from the schema, mapper, and INSERT. beep boop
… iteration Project.id IN (subquery) doesn't care about duplicates in the IN list, so the inner .distinct() on the project_ids subquery is wasted work. Drop it and stream the projects via .iterator() so we don't materialise every FoF-enabled SaaS project at once. beep boop
…l completes The previous shape collected every project's id during the backfill loop and fanned out refresh tasks in one burst at the end of the daily run — a queue-depth spike at SaaS scale. Move the dispatch inside the project loop so refresh enqueues track the backfill cadence (one per project as its cursor closes) and naturally spread across the backfill window. beep boop
…kstop "Runaway queries" overstated the concern — we don't have a known class of queries to guard against. The actual reason for the 30m bound is to free the task-processor slot when the CH client or dispatcher hangs. beep boop
Standardise toward the convention used elsewhere in tests/integration: inline f-string URL paths over Django's reverse(). Easier to scan when reviewing a test and one less import. beep boop
The Given/When/Then prose was restating what each line already does. Drop it back to terse markers (combining # Given / When or # When / Then on shorter tests), keeping only the comments that explain *why* — e.g. the FINAL clause forcing ReplacingMergeTree dedup, log_comment landing in query_log. beep boop
A literal ISO timestamp on both sides of the assertion reads "dumber" than a datetime computed then reformatted, which is the preferred shape in tests. beep boop
30m was a comfortable buffer, not a measured one. A single project refresh is one UNION ALL aggregation; 10m is ~2x the legitimate ceiling and lets us reclaim zombies inside one backfill cadence instead of letting them linger. Widen on real data if this starts false-firing. beep boop
emyller
left a comment
There was a problem hiding this comment.
Thanks for addressing all comments. This looks good to ship!
Bumps flagsmith-sql-flag-engine to the public 0.1.0 release and removes the CodeArtifact source override plus every CI/Dockerfile workaround that existed only because the package was private. CodeArtifact wiring remains in place where it serves flagsmith-private. beep boop
matthewelwell
left a comment
There was a problem hiding this comment.
Approving based on evandro's previous review and minor diff since.
Thanks for submitting a PR! Please check the boxes below:
docs/if required so people know about the feature.Changes
Contributes to #5663.
Adds a daily pipeline that backfills Dynamo identities into ClickHouse, materialises per-(segment, environment) match counts via
flagsmith-sql-flag-engine(ClickHouseDialect), and exposes them on the segment endpoint asmemberships: [{environment, count, last_synced_at}]for env-dropdown badges. Gated behind the org-scopedsegment_membership_inspectionFoF flag; no-ops whenCLICKHOUSE_HOSTis unset.History note: the PoC originally landed against Snowflake. After the 12/05/2026 RFC review, the cost analysis (and a stacked engine PR adding the ClickHouse dialect) tipped us toward ClickHouse for V1.
Storage decisions:
ReplacingMergeTree(inserted_at) ORDER BY (environment_id, id). Daily INSERTs dedupe at merge time (most-recentinserted_atwins); refresh queries useFROM IDENTITIES AS i FINALto force read-time dedup so counts always reflect the latest backfill.JSONcolumn (CH 24+; GA in 25.x). Each key becomes a typed columnar subcolumn — same cost shape as Snowflake's VARIANT.schema_ddlstaysMergeTreeas the simplest correct shape; this PoC overrides for its mutation-heavy workload. Translator output is engine-agnostic so the override is invisible to the SQL flag engine.Spend attribution:
clickhouse-connectopens a session with alog_commentsetting per task (flagsmith:segment_membership:{backfill,refresh}:org_X:project_Y), landing insystem.query_logfor per-(org, project) rollups — direct analogue of Snowflake'sQUERY_TAG.Review complexity: 4/5
Review order recommendation:
models.py(cache table)services.py(compile + count, parameterised SQL withFINAL)tasks.py(daily recurring backfill fans out per-project refresh)mappers.py(Dynamo doc → IDENTITIES row)migrations/0002_*(ClickHouseReplacingMergeTreeDDL viaRunPython, no-op when unconfigured)segments/serializers.py+views.py(read-sidemembershipsfield, prefetched).How did you test this code?
Beyond the existing unit + integration tests:
Ran an end-to-end smoke test against
moto-mocked Dynamo + a real ClickHouse Cloud trial (eu-west-2) configured via env vars. The flow:plan IN (growth, enterprise)rule).make docker-up django-migrate— migrations created theSegmentMembershiptable in core Postgres and theIDENTITIESschema in ClickHouse Cloud.moto.mock_dynamodband seeded 50 synthetic identities — 25 withtraits.plan = "growth", 25 withtraits.plan = "free".backfill_identities_to_clickhouse()directly — confirmed all 50 rows landed in ClickHouse'sIDENTITIEStable with the per-(org, project)log_commentattribution attached.refresh_project_segment_counts(project_id)(inline-dispatched from the backfill's fan-out) — confirmed exactly oneSegmentMembershiprow materialised withcount = 25and a freshlast_synced_at.Extensive testing will be done on staging.